Tracing Text Provenance via Context-Aware Lexical Substitution

Abstract

Text content created by humans or language models is often stolen or misused by adversaries. Tracing text provenance can help claim the ownership of text content or identify malicious users who distribute misleading content like machine-generated fake news. There have been some attempts to achieve this, mainly based on watermarking techniques. Specifically, traditional text watermarking methods embed watermarks by slightly altering text format like line spacing and font, which, however, are fragile to cross-media transmissions like OCR. Considering this, natural language watermarking methods represent watermarks by replacing words in original sentences with synonyms from handcrafted lexical resources (e.g., WordNet), but they do not consider the substitution's impact on the overall sentence's meaning. Recently, a transformer-based network was proposed to embed watermarks by modifying unobtrusive words (e.g., function words), which also impairs the sentence's logical and semantic coherence. Besides, one well-trained network fails on other different types of text content. To address the limitations mentioned above, we propose a natural language watermarking scheme based on context-aware lexical substitution (LS). Specifically, we employ BERT to suggest LS candidates by inferring the semantic relatedness between the candidates and the original sentence. Based on this, a selection strategy in terms of synchronicity and substitutability is further designed to test whether a word is exactly suitable for carrying the watermark signal. Extensive experiments demonstrate that, under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences and has better transferability than existing methods. Besides, the proposed LS approach outperforms the state-of-the-art approach on the Stanford Word Substitution Benchmark.
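
The general idea of carrying watermark bits through word substitution can be illustrated with a minimal sketch. This is not the authors' scheme: the hand-written `SYNONYM_SETS` table is a hypothetical, context-independent stand-in for the BERT-based candidate generator, and the names `candidates`, `embed`, and `extract` are invented for this example. What it does show is the synchronicity requirement mentioned in the abstract: the extractor must be able to rebuild the same candidate set from the watermarked sentence alone.

```python
# Toy illustration only: the paper uses BERT to propose context-aware
# candidates; this invented table is a context-independent stand-in.
SYNONYM_SETS = [
    {"big", "large"},
    {"quick", "fast"},
    {"begin", "start"},
]


def candidates(word):
    """Return the sorted substitution set containing `word`, or None.

    Every member of a set maps to the same sorted list, so embedder and
    extractor always agree on the candidate order (synchronicity).
    """
    for group in SYNONYM_SETS:
        if word in group:
            return sorted(group)
    return None


def embed(sentence, bits):
    """Encode one bit per substitutable word by picking candidate[bit]."""
    remaining = list(bits)
    out = []
    for word in sentence.split():
        cands = candidates(word)
        if cands and remaining:
            out.append(cands[remaining.pop(0)])
        else:
            out.append(word)
    return " ".join(out)


def extract(sentence, n_bits):
    """Recover the first `n_bits` bits from a watermarked sentence."""
    bits = []
    for word in sentence.split():
        cands = candidates(word)
        if cands and len(bits) < n_bits:
            bits.append(cands.index(word))
    return bits


marked = embed("the big dog will begin a quick run", [1, 0, 1])
recovered = extract(marked, 3)
```

The sketch handles only lowercase, whitespace-separated words and performs no substitutability test. In the scheme described above, candidates would depend on the sentence context, which is why a dedicated selection strategy is needed to keep embedding and extraction in sync.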

Similar resources

Controlling Lexical Substitution in Computer Text Generation

This report describes Paul, a computer text generation system designed to create cohesive text through the use of lexical substitutions. Specifically, this system is designed to deterministically choose between pronominalization, superordinate substitution, and definite noun phrase reiteration. The system identifies a strength of antecedence recovery for each of the lexical substitutions, and matches...

Flexible Provenance Tracing

The description of the origins of a piece of data and the transformations by which it arrived in a database is termed the data provenance. The importance of data provenance has already been widely recognized in database community. The two major approaches to representing provenance information use annotations and inversion. While annotation is metadata pre-computed to include the derivation his...

SW-AG: Local Context Matching for English Lexical Substitution

We present two systems that pick the ten most appropriate substitutes for a marked word in a test sentence. The first system scores candidates based on how frequently their local contexts match that of the marked word. The second system, an enhancement to the first, incorporates cosine similarity using unigram features. The core of both systems bypasses intermediate sense selection. Our results...

MELB-MKB: Lexical Substitution system based on Relatives in Context

In this paper we describe the MELB-MKB system, as entered in the SemEval-2007 lexical substitution task. The core of our system was the “Relatives in Context” unsupervised approach, which ranked the candidate substitutes by web-lookup of the word sequences built combining the target context and each substitute. Our system ranked third in the final evaluation, performing close to the top-ranked ...

Transparently Gathering Provenance with Provenance Aware Condor

We observed that the Condor batch execution system exposes a lot of information about the jobs that run in the system. This observation led us to explore whether this system information could be used for provenance. The result of our explorations is Provenance Aware Condor (PAC), a system that transparently gathers provenance while jobs run in Condor. Transparent provenance gathering requires t...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i10.21415